Abstract

This technical report presents a characterization of the power consumption of the Itsy Pocket 
Computer Version 1.5 [1], a state-of-the-art pocket computer developed by Compaq Computer
Corporation's Palo Alto Labs. The main goal of this characterization is to identify specific architectural
features of the Itsy Pocket Computer that the operating system and applications can use to 
reduce total energy consumption. A secondary goal is to gather background data that can be 
used to explain application-specific energy usage. 

This report examines the power and energy cost of: running the system at clock frequencies between 59 MHz
and 206 MHz, reading and writing data (with and without enabling the MMU, data cache, 
and write buffer), reading and writing flash memory, flushing the instruction and data caches, 
enabling the UART, transmitting data over a serial line at numerous baud rates, and enabling 
and disabling the LCD. 

This report also presents the DRAM access times and bandwidths supported by the Itsy Pocket 
Computer Architecture, as a function of the processor clock speed. 

Some of the material presented in this technical note is discussed in a paper authored by Keith 
I. Farkas, Jason Flinn, Godmar Back, Dirk Grunwald, and Jennifer Anderson, which will appear 
in the Proceedings of the ACM SIGMETRICS 2000 International Conference on Measurement 
and Modeling of Computer Systems.

Figure 1: Precision resistors for current measurement (Itsy Version 1.5)



1 Background 

This study characterizes the power consumption of the Itsy Pocket Computer [1] with a series of 
micro-benchmarks. The main goal is to identify specific architectural features that the operating 
system and applications can use to reduce total energy consumption. A secondary goal is to gather 
background data that can be used to explain application-specific energy usage. 

This report also presents the DRAM memory access times and bandwidths of the Itsy Pocket 
Computer Architecture as a function of the processor speed. This data is given in Appendix A.

Some of the material presented in this technical note is discussed in a paper authored by Keith 
I. Farkas, Jason Flinn, Godmar Back, Dirk Grunwald, and Jennifer Anderson, which will appear in 
the Proceedings of the ACM SIGMETRICS 2000 International Conference on Measurement and 
Modeling of Computer Systems. 



1.1 Measurement Methodology 

In this report, we examine the power and energy consumption of the Itsy Pocket Computer Version 
1.5 and its components. This version of the Itsy contains two power domains, a 3.3 Volt domain, 
PWR33, and a 1.5 Volt domain, PWR15. The PWR15 domain powers the Itsy's microprocessor, 
which is a 200 MHz StrongARM SA-1100 [2]. The PWR33 domain powers all the other components. 
More information on the architecture of the Itsy Pocket Computer Version 1.5 is provided in the 
Itsy Pocket Computer Version 1.5 Hardware Description Manual, which is available as part of the 
Itsy Version 1.5 Hardware Specifications. See [3] for more information. 

To measure the power consumed by each of these domains and the power supplied to the Itsy 
(the input power), the following procedure is used. First, consider the input power. 

The input power at a time T is equal to the product of the current flowing into the Itsy at 
time T from a power source (e.g., a battery or power supply) and the voltage this current induces. 
The input current is measured indirectly by measuring the differential voltage across a small-valued
resistor in the main current path. This resistor is labeled r03 in Figure 1; this figure shows the two
power domains and the three resistors provided for measuring current. Given that this resistor is
a 20 mΩ ± 1% resistor, the current of interest, $I_{r03}$, is equal to $V_{r03} / 0.02$, where $V_{r03}$ is the differential
voltage across r03. The voltage induced by $I_{r03}$ is the voltage between test points t03 and t01, that
is, voltage $V_{pwr\_in}$.

The instantaneous power consumed by each of the two power domains is similarly measured 
and calculated. Thus, six voltage measurements are required to measure both the input power and 
the power consumption of each power domain. 
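The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names are ours, the sample reading is hypothetical, and we assume the other two sense resistors follow the same 20 mΩ arrangement, which the text does not state explicitly.

```python
SENSE_RESISTANCE_OHMS = 0.020  # 20 milliohm sense resistor, as stated in the text


def sense_current(v_diff_volts, r_ohms=SENSE_RESISTANCE_OHMS):
    """Current (A) through a sense resistor, from the differential voltage across it."""
    return v_diff_volts / r_ohms


def domain_power(v_diff_volts, v_supply_volts, r_ohms=SENSE_RESISTANCE_OHMS):
    """Instantaneous power (W): sense-resistor current times the supply voltage."""
    return sense_current(v_diff_volts, r_ohms) * v_supply_volts


# Hypothetical reading: 3.0 mV across the resistor with a 3.3 V supply.
print(f"{domain_power(0.003, 3.3):.3f} W")
```

Each of the three power figures (input, PWR33, PWR15) needs one differential voltage and one supply voltage, which is why six voltage measurements are required in total.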

The data reported here was obtained by measuring these six voltages using six differential 
amplifiers and a data acquisition (DAQ) system, while the Itsy was powered by an external voltage 
supply. The amplifiers were employed to minimize the error introduced into our measurements by 
electro-magnetic noise, and the limited precision of the DAQ system. The output of each amplifier 
was connected to one of the analog inputs of the DAQ system. The DAQ system was in turn 
connected to a workstation, which initiated the measurement acquisition process and recorded the 
results. 

The micro-benchmarks used in this work were designed to exhibit a constant load on the Itsy 
for a period of several seconds. During this time, the DAQ system was used to measure each of 
the six voltages, one at a time, as the DAQ system could only measure one channel at a time. 
The data reported here for each benchmark represents the average of these readings, with the first 
and last set of six-voltage measurements excluded. The first and last measurements were excluded
because they can be inaccurate, owing to the sequential measuring of the six voltages. The power 
measurements from two successive trials of the same benchmark were found to differ by as much 
as 2 mW. Note that while this methodology is suitable for measuring the power usage of the tasks 
reported on here, it is not suitable for capturing more dynamic power usage, such as that from 
booting of an operating system. 
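The averaging step can be illustrated as follows. This is a minimal sketch, assuming each sweep of the six channels has already been reduced to a single power value; the function name and the sample values are hypothetical.

```python
def average_power(sweep_powers_w):
    """Average per-sweep power values (W), excluding the first and last sweep.

    The first and last sweeps are dropped because a sweep that straddles the
    start or end of the benchmark mixes loaded and unloaded readings.
    """
    if len(sweep_powers_w) < 3:
        raise ValueError("need at least three sweeps to drop the first and last")
    kept = sweep_powers_w[1:-1]
    return sum(kept) / len(kept)


# Hypothetical sweeps: the first and last values are visibly off.
print(f"{average_power([0.120, 0.314, 0.315, 0.313, 0.200]):.3f} W")
```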

The duration of each benchmark was measured using the StrongARM SA-1100's OS Timer 
Count Register (OSCR), which runs off the 3.6864 MHz oscillator. Since this register is not accurate 
when the processor is switching clock frequencies, care was taken to avoid measurements which 
spanned changes in the clock frequency. 
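Converting two OSCR readings into a duration is a matter of dividing the tick difference by the oscillator frequency. The sketch below assumes the OSCR is a free-running 32-bit counter, so a single wrap-around between the two readings is handled with modular arithmetic; the function name is ours.

```python
OSCR_HZ = 3_686_400          # 3.6864 MHz oscillator, as stated in the text
OSCR_MODULUS = 1 << 32       # assumption: 32-bit free-running counter


def elapsed_seconds(oscr_start, oscr_end):
    """Elapsed time between two OSCR readings, tolerating one counter wrap."""
    ticks = (oscr_end - oscr_start) % OSCR_MODULUS
    return ticks / OSCR_HZ


print(elapsed_seconds(1_000, 1_000 + 5 * OSCR_HZ))   # a five-second interval
print(OSCR_MODULUS / OSCR_HZ)                        # seconds until the counter wraps
```

Under the 32-bit assumption the counter wraps roughly every 19 minutes, far longer than the several-second benchmarks described here.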

Each benchmark was run directly on top of the hardware using the Itsy Monitor [3] (version 
1.3). Therefore, no OS overhead is included in these results. 

2 Clock-Rate Micro-benchmarks 
2.1 Description 

These micro-benchmarks examine the effect of the core clock frequency of the processor on Itsy 
power usage. The frequency of the SA-1100 core clock can be varied from 59.0 MHz to 206.4 MHz. 
Operations performed at lower clock frequencies generally consume less power, but require more 
time to complete. Therefore, one of the goals of these benchmarks is to examine whether reducing 
the core clock frequency can reduce energy usage. 

Additionally, the SA-1100 supports a clock-switching mode, in which the core clock frequency 
is normally twice the speed at which the external (system) bus is run. If a read miss occurs, the
core clock frequency drops to the bus speed until the needed data is read from memory. If clock
switching is disabled, then the core clock frequency is always the same as the bus speed. For 
example, if clock-switching is disabled and the core clock-frequency is set to 206.4 MHz, then both 
the core and bus operate at 103.2 MHz. 

For each benchmark described below, the SA-1100 core clock frequency is varied from 59.0 MHz 
to 206.4 MHz. Unless otherwise noted, all configurable hardware components are disabled, with 
the exception of the DRAM banks, static memory, and the instruction cache. 



• Sleep - measures power usage while the Itsy is in sleep mode. The benchmark stops the 3.6864
MHz oscillator and places the Itsy in sleep mode.



• Idle - measures power usage while the Itsy is in idle mode. The benchmark sets the clock 
speed and disables clock-switching before entering idle mode. 

• Wait - measures power usage while the Itsy executes a busy-wait loop. The benchmark sets
an OS match timer (OSMR0) to expire in five seconds, then continuously polls the status 
register until the timer expires. Clock-switching is enabled for this benchmark. 

• Add - measures power usage for a compute-intensive application. The benchmark performs 
300 million additions within a small loop, so that no DRAM accesses are required within the 
inner loop. Clock-switching is enabled for this benchmark. 

• Add/NS - measures power usage for a compute-intensive application when clock-switching is 
disabled. As a result, the processor core always runs at the bus speed. With the exception 
of disabling clock-switching the benchmark is identical to the Add benchmark. 

2.2 Data 

• Figure 2: Input Power Consumption for Clock-Rate Micro-benchmarks

• Figure 3: Main (3.3 V) Power Consumption for Clock-Rate Micro-benchmarks

• Figure 4: Core (1.5 V) Power Consumption for Clock-Rate Micro-benchmarks

• Figure 5: Duration of Clock-Rate Micro-benchmarks

• Figure 6: Energy Consumption of Clock-Rate Micro-benchmarks

• Figure 7: Power consumption when the SA-1100's core voltage was lowered from 1.5 V to 
1.23 V. 



WRL Technote 56 



7 



Power and Energy Characterization of Itsy Version 1.5 



Clock Freq.                   Power (W)
(MHz)         Sleep    Idle     Wait     Add      Add/NS
 59.0         0.010    0.092    0.225    0.314    0.209
 73.7         -        0.099    0.262    0.375    0.244
 88.5         -        0.106    0.300    0.434    0.276
103.2         -        0.113    0.338    0.494    0.312
118.0         -        0.119    0.375    0.553    0.343
132.7         0.010    0.126    0.412    0.612    0.376
147.5         -        0.133    0.450    0.670    0.410
162.2         -        0.140    0.487    0.728    0.447
176.9         -        0.147    0.523    0.785    0.475
191.7         -        0.154    0.560    0.843    0.511
206.4         0.011    0.161    0.596    0.899    0.541


Figure 2: Input Power Consumption for Clock-Rate Micro-benchmarks
Figure 3: Main (3.3 V) Power Consumption for Clock-Rate Micro-benchmarks
Figure 4: Core (1.5 V) Power Consumption for Clock-Rate Micro-benchmarks
Figure 5: Duration of Clock-Rate Micro-benchmarks
Figure 6: Energy Consumption of Clock-Rate Micro-benchmarks



2.3 Discussion 

As expected, sleep mode power usage does not depend upon the initial clock-rate. However, the 
power usage in idle mode varies significantly as the initial clock-rate changes. This variation is due 
to the power consumed by the microprocessor's components that still run while it is in idle
mode. This observation suggests that the core-clock frequency should be reduced whenever it is
likely that the Itsy will remain idle for a significant period of time. This check could be performed
in the Linux¹ idle procedure before entering idle mode.
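To put a number on the idle-mode observation, the sketch below compares idle input power at three of the measured clock frequencies. The power values are copied from the Idle column of the Figure 2 data; the function name is ours.

```python
# Idle-mode input power in watts, from the Idle column of the Figure 2 data.
IDLE_POWER_W = {59.0: 0.092, 132.7: 0.126, 206.4: 0.161}


def idle_saving_fraction(from_mhz, to_mhz):
    """Fraction of idle power saved by dropping from one clock frequency to another."""
    return 1.0 - IDLE_POWER_W[to_mhz] / IDLE_POWER_W[from_mhz]


print(f"{idle_saving_fraction(206.4, 59.0):.1%} saved by idling at 59.0 MHz instead of 206.4 MHz")
```

By this measure, idling at the lowest frequency rather than the highest saves a little over 40% of input power, before accounting for any cost of the frequency change itself, which is not measured here.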

Disappointingly, reducing the clock-frequency appears to produce no significant energy savings 
in other circumstances. Performing a given set of operations at a lower clock-frequency consumes 
less power, but also takes longer to complete. These two effects compete against each other, 
producing approximately equal energy consumption. The main reason for this behavior is that the 
voltage supplied to the SA-1100 is not reduced at the same time as is the clock frequency. These 
results do indicate that significant power savings might be achieved with a chip that allowed a 
variable voltage supply, since the energy usage is roughly equivalent without any voltage variation. 
Further, in this study we assume only ideal battery behavior, but as Tom Martin showed in his 
Ph.D. dissertation [4], lowering clock-frequency can be beneficial if the effects of non-ideal battery 
behavior are considered. 

To better understand the power reduction that is achieved with a voltage reduction, we modified
the Itsy Pocket Computer so that we could run the SA-1100 at either 1.5 Volts or 1.23 Volts; Figure 7
lists the power consumption of the Itsy and the processor for three of the micro-benchmarks at
these two voltage levels. Clearly there is a power saving when running at a lower core voltage. Note
that for the "busy wait" benchmark when the LCD was turned on, the SA-1100 would not operate
properly at 206 MHz and 1.23 Volts.

¹ The Linux operating system has been ported to the Itsy Pocket Computer.

Figure 7: Power consumption when the SA-1100's core voltage was lowered from 1.5 V to 1.23 V.

As expected, disabling clock-switching increases the energy used to execute a tight loop. The
time to complete the loop doubles, but the corresponding power savings is smaller.
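The clock-switching result follows from energy being the product of power and time. The sketch below uses the Add and Add/NS input power values from the Figure 2 data and takes the factor-of-two slowdown from the text as an exact ratio, which is an approximation; the function name is ours.

```python
# Input power in watts, from the Add and Add/NS columns of the Figure 2 data.
ADD_POWER_W = {59.0: 0.314, 132.7: 0.612, 206.4: 0.899}
ADD_NS_POWER_W = {59.0: 0.209, 132.7: 0.376, 206.4: 0.541}

# From the text: the loop takes twice as long with clock-switching disabled.
DURATION_RATIO_NS = 2.0


def energy_ratio_no_switching(core_mhz):
    """Energy of Add/NS relative to Add (values above 1 mean more energy)."""
    return ADD_NS_POWER_W[core_mhz] * DURATION_RATIO_NS / ADD_POWER_W[core_mhz]


for mhz in sorted(ADD_POWER_W):
    print(f"{mhz:6.1f} MHz: Add/NS uses {energy_ratio_no_switching(mhz):.2f}x the energy of Add")
```

Because power falls by well under half while the duration doubles, the ratio stays above 1 at every frequency shown.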

3 Memory Micro-benchmarks
3.1 Description 

These micro-benchmarks examine energy usage while reading from and writing to DRAM. The 
read benchmark executes a large number of load instructions inside of a tight loop that has been 
unrolled sixteen times. The write benchmark executes a large number of stores in a similar loop. 
In each benchmark, 100 MB of data is read/written. Each address in memory is read or written 
many times. 

Both the read and write benchmarks are executed with two different patterns of data access. 
When the in-cache pattern is used, all load (or store) instructions hit in the data cache. When 
the out-of-cache pattern is used, all load (or store) instructions miss in the data cache. 

Each micro-benchmark is executed with the following memory-management options : 

• IMWD - The instruction cache, MMU, write-buffer and data cache are all enabled. 

• IMW - The instruction cache, MMU, and write-buffer are enabled. The data cache is disabled. 

• IM - The instruction cache and MMU are enabled. The data cache and write-buffer are 
disabled. 

• I - Only the instruction cache is enabled. All data is addressed physically. 

For each scenario, the core clock frequency was varied from 59.0 MHz to 206.4 MHz. In
addition, each benchmark was executed with clock-switching enabled and with switching disabled. 



3.2 Data 

• Figure 8: Input Power Consumption of Memory Micro-benchmarks with Clock-Switching
Enabled

• Figure 9: Input Power Consumption of Memory Micro-benchmarks with Clock-Switching
Disabled

• Figure 10: Main (3.3V) Power Consumption of Memory Micro-benchmarks with
Clock-Switching Enabled

• Figure 11: Main (3.3V) Power Consumption of Memory Micro-benchmarks with
Clock-Switching Disabled

• Figure 12: Core (1.5V) Power Consumption of Memory Micro-benchmarks with
Clock-Switching Enabled

• Figure 13: Core (1.5V) Power Consumption of Memory Micro-benchmarks with
Clock-Switching Disabled

• Figure 14: Duration of Memory Micro-benchmarks with Clock-Switching Enabled

• Figure 15: Duration of Memory Micro-benchmarks with Clock-Switching Disabled

• Figure 16: Energy Consumption of Memory Micro-benchmarks with Clock-Switching Enabled

• Figure 17: Energy Consumption of Memory Micro-benchmarks with Clock-Switching
Disabled

Figure 8: Input Power Consumption of Memory Micro-benchmarks with Clock-Switching Enabled

3.3 Discussion 

Virtual addressing (turning on the MMU) does not incur any noticeable cost, in terms of energy 
or performance, when compared to physical addressing. The data collected with the MMU and 
instruction cache enabled is virtually identical to the data collected with only the instruction cache 
enabled for all benchmarks. 






Clock Freq.  Data                                      Power (W)
(MHz)        Locality                  Read                            Write
                           IMWD    IMW     IM      I       IMWD    IMW     IM      I
 59.0        In Cache      0.245   0.458   0.460   0.457   0.238   0.588   0.552   0.546
 59.0        Out of Cache  0.443   0.473   0.475   0.474   0.585   0.588   0.552   0.546
 73.7        In Cache      0.287   0.535   0.537   0.534   0.281   0.690   0.652   0.645
 73.7        Out of Cache  0.500   0.554   0.557   0.554   0.687   0.690   0.652   0.645
 88.5        In Cache      0.329   0.582   0.584   0.580   0.323   0.793   0.753   0.747
 88.5        Out of Cache  0.547   0.602   0.604   0.602   0.790   0.793   0.753   0.747
103.2        In Cache      0.371   0.653   0.655   0.651   0.366   0.897   0.852   0.846
103.2        Out of Cache  0.602   0.676   0.677   0.676   0.893   0.897   0.852   0.846
118.0        In Cache      0.413   0.722   0.723   0.719   0.407   0.880   0.950   0.943
118.0        Out of Cache  0.655   0.748   0.750   0.746   0.878   0.880   0.947   0.943
132.7        In Cache      0.455   0.748   0.750   0.746   0.449   0.968   1.047   1.039
132.7        Out of Cache  0.695   0.775   0.776   0.774   0.965   0.968   1.044   1.039
147.5        In Cache      0.496   0.814   0.816   0.811   0.491   0.955   1.056   1.048
147.5        Out of Cache  0.690   0.844   0.845   0.843   0.950   0.955   1.056   1.048
162.2        In Cache      0.538   0.879   0.880   0.875   0.532   1.034   1.146   1.138
162.2        Out of Cache  0.698   0.911   0.911   0.909   1.031   1.034   1.142   1.138
176.9        In Cache      0.578   0.895   0.897   0.892   0.573   1.004   1.120   1.113
176.9        Out of Cache  0.730   0.927   0.926   0.925   1.000   1.004   1.116   1.113
191.7        In Cache      0.620   0.955   0.956   0.951   0.615   1.004   1.123   1.116
191.7        Out of Cache  0.771   0.991   0.990   0.987   1.000   1.004   1.119   1.116
206.4        In Cache      0.660   0.961   0.963   0.958   0.655   1.059   1.187   1.179
206.4        Out of Cache  0.762   0.997   0.995   0.994   1.055   1.059   1.183   1.179

Figure 9: Input Power Consumption of Memory Micro-benchmarks with Clock-Switching Disabled
Figure 10: Main (3.3V) Power Consumption of Memory Micro-benchmarks with Clock-Switching Enabled
Figure 11: Main (3.3V) Power Consumption of Memory Micro-benchmarks with Clock-Switching Disabled
Figure 12: Core (1.5V) Power Consumption of Memory Micro-benchmarks with Clock-Switching Enabled
Figure 13: Core (1.5V) Power Consumption of Memory Micro-benchmarks with Clock-Switching Disabled



Enabling the write buffer incurs a significant energy penalty for the write benchmarks. With 
clock-switching enabled, the benchmark is performed slightly faster with the write buffer enabled, 
but power consumption is much higher. One possible explanation for this behavior may be that 
when the benchmark fills all available write buffer entries, the processor stalls but the core clock 
frequency does not drop to the speed of the bus. Indeed, when clock-switching is disabled, the 
write benchmark uses significantly less energy. In fact, with clock-switching disabled, enabling the 
write buffer reduces energy consumption. 

Enabling the data cache provides the expected benefit when reads and writes hit in the cache.
When reads and writes miss in the cache, more energy is consumed with the data cache enabled. 
This is mostly due to the longer time needed to perform the benchmark. However, the out-of-cache 
benchmark exhibits pathologically bad cache behavior that may not occur under any realistic 
workload.

Figure 14: Duration of Memory Micro-benchmarks with Clock-Switching Enabled
Figure 15: Duration of Memory Micro-benchmarks with Clock-Switching Disabled
Figure 16: Energy Consumption of Memory Micro-benchmarks with Clock-Switching Enabled
Figure 17: Energy Consumption of Memory Micro-benchmarks with Clock-Switching Disabled

Since enabling the data cache requires that the write-buffer also be enabled, the data cache
experiments exhibit some of the poor write-benchmark performance noticed previously. When 
writes miss in the cache, disabling clock-switching reduces energy consumption. However, when 
writes hit in the cache, disabling clock-switching incurs an energy penalty. This behavior suggests 
that it may be beneficial to disable clock-switching before performing a large block of writes that 
are known to have poor data locality. This idea is examined further in the next section, which 
benchmarks a large data copy.

Figure 18: Input Power Consumption of Copy Micro-benchmark

4 Copy Micro-benchmark 
4.1 Description 

This micro-benchmark examines the effect of disabling clock-switching while copying a large block 
of data. The benchmark copies a 128 KB block of data from one memory location to another 
using a copy implementation virtually identical to the Linux kernel memcpy implementation. The 
benchmark performs this copy 1000 times. 

Two scenarios were measured, one in which clock-switching was enabled and the other in which 
clock-switching was disabled. In both cases, the instruction cache, MMU, data cache, and
write-buffer were all enabled.

4.3 Discussion

Figure 19: Main (3.3 V) Power Consumption of Copy Micro-benchmark
Figure 20: Core (1.5 V) Power Consumption of Copy Micro-benchmark
Figure 21: Duration of Copy Micro-benchmark
Figure 22: Energy Consumption of Copy Micro-benchmark

By disabling clock-switching, the total energy used to copy 128 KB of data can be reduced by
almost 27%. At the same time, copy performance decreases by less than 0.3%. This indicates
that energy savings may be achieved by inserting additional logic in common routines such as
memcpy, memset, bcopy, and bzero to disable clock-switching if a large number of writes are to
be performed. Further exploration is needed to develop appropriate heuristics to determine when
disabling clock-switching is beneficial.
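One possible shape for such logic is sketched below in Python for readability; a real implementation would live in C or assembly inside the copy routines. Everything here is hypothetical: the FakeClockControl class stands in for platform code that toggles SA-1100 clock-switching, and the 64 KB threshold is an arbitrary placeholder, since the report leaves the choice of heuristic open.

```python
from contextlib import contextmanager

# Hypothetical threshold; the report leaves the right heuristic open.
LARGE_COPY_BYTES = 64 * 1024


class FakeClockControl:
    """Stand-in for platform code that toggles clock-switching (hypothetical)."""

    def __init__(self):
        self.switching_enabled = True
        self.transitions = []

    def set_switching(self, enabled):
        self.switching_enabled = enabled
        self.transitions.append(enabled)


@contextmanager
def clock_switching_disabled(ctrl):
    """Disable clock-switching for the duration of a block, then restore it."""
    previous = ctrl.switching_enabled
    ctrl.set_switching(False)
    try:
        yield
    finally:
        ctrl.set_switching(previous)


def energy_aware_copy(ctrl, dst, src, threshold=LARGE_COPY_BYTES):
    """Copy src into dst, disabling clock-switching only for large copies."""
    n = len(src)
    if n >= threshold:
        with clock_switching_disabled(ctrl):
            dst[:n] = src
    else:
        dst[:n] = src


ctrl = FakeClockControl()
src = bytes(range(256)) * 512      # 128 KB, the block size used in the benchmark
dst = bytearray(len(src))
energy_aware_copy(ctrl, dst, src)
print(ctrl.transitions)            # switching turned off, then restored
```

Restoring the previous setting rather than unconditionally re-enabling clock-switching keeps the wrapper safe to nest inside code that has already disabled it.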

5 Flash Micro-benchmarks
5.1 Description 

These micro-benchmarks examine energy consumption when reading or writing flash memory. The 
read benchmark executes a large number of load instructions inside of a tight loop that has been 
unrolled sixteen times. It reads the same amount of data as the DRAM read benchmark (100 MB), 
so that the results of the two benchmarks can be compared. Two read scenarios are examined, one 
in which each load hits in the data cache (which is enabled), and one in which each load misses 
in the data cache. Read performance is shown for different clock frequencies, as well as with clock 
switching enabled and disabled. 

The write benchmark programs a 1 KB region of flash 50 times. The amount of data written is
significantly less than for both the flash read benchmark and also the DRAM write benchmark. 
Therefore, the results of these benchmarks should not be directly compared. Two write modes are 
examined, one in which the data being written is merged with the existing data stored in the flash 
memory, and one in which the data in flash is ignored. 

5.3 Discussion

When data accesses miss in the data cache, reducing the clock frequency has a small benefit when 
reading data from flash. Reading data from flash consumes less power than reading data from 
DRAM, but executes two to three times slower. Therefore, reading from flash consumes more 
energy (approximately 18%) than reading from DRAM. This suggests that data in flash which is 
read fairly often could be copied to DRAM to save energy. However, data that is read only a few 
times may be best left in flash. 
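The trade-off between leaving data in flash and copying it to DRAM can be framed as a break-even count. The sketch below is a rough model, not a measured result: it normalizes the energy of one DRAM read pass to 1.0, takes the flash read pass as 1.18 from the 18% figure above, and assumes a relative DRAM write cost of 1.0, which the report does not provide.

```python
import math


def break_even_reads(flash_read=1.18, dram_read=1.0, dram_write=1.0):
    """Smallest number of read passes for which copying to DRAM first saves energy.

    Copying costs one flash read pass plus one DRAM write pass; afterwards each
    pass is served from DRAM. dram_write is an assumed relative cost.
    """
    saving_per_read = flash_read - dram_read
    if saving_per_read <= 0:
        raise ValueError("copying never pays off unless flash reads cost more")
    copy_cost = flash_read + dram_write
    return math.floor(copy_cost / saving_per_read) + 1


print(break_even_reads())                 # with the assumed write cost of 1.0
print(break_even_reads(dram_write=0.0))   # optimistic bound: free DRAM writes
```

Under these assumptions, the copy only pays for itself after roughly a dozen passes over the data, which is consistent with the suggestion that rarely read data is best left in flash.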

Writing flash is extremely slow compared to writing DRAM. Although clock frequency does not 
appear to affect energy usage, disabling clock-switching while writing flash yields a small energy 
benefit. 






Clock Freq.                                          Power (W)
(MHz)        Read - In Cache        Read - Out of Cache    Write - Merge          Write - Ignore
             Switching  No Switch.  Switching  No Switch.  Switching  No Switch.  Switching  No Switch.
 59.0        0.396      0.249       0.318      0.313       0.561      0.511       0.559      0.504
 73.7        0.476      0.294       0.322      0.318       0.564      0.513       0.555      0.507
 88.5        0.555      0.338       0.326      0.323       0.560      0.511       0.556      0.508
103.2        0.634      0.382       0.371      0.367       0.558      0.515       0.557      0.508
118.0        0.711      0.425       0.369      0.367       0.561      0.509       0.560      0.510
132.7        0.788      0.469       0.372      0.370       0.562      0.514       0.553      0.502
147.5        0.868      0.512       0.406      0.403       0.567      0.508       0.560      0.511
162.2        0.940      0.555       0.405      0.404       0.575      0.513       0.560      0.507
176.9        1.020      0.597       0.407      0.405       0.561      0.503       0.558      0.507
191.7        1.089      0.640       0.435      0.434       0.561      0.508       0.559      0.508
206.4        1.162      0.682       0.435      0.434       0.567      0.509       0.561      0.507

Figure 23: Input Power Consumption of Flash Micro-benchmarks
Figure 24: Main (3.3V) Power Consumption of Flash Micro-benchmarks
Figure 25: Core (1.5V) Power Consumption of Flash Micro-benchmarks
Figure 26: Duration of Flash Micro-benchmarks
Figure 27: Energy Consumption of Flash Micro-benchmarks
Figure 28: Input Power Consumption for UART Mode Micro-benchmarks



6 UART Mode Micro-benchmarks 

6.1 Description 

These micro-benchmarks examine the effect of the UART mode on the Itsy's power usage. The
next set of micro-benchmarks examines power usage while the UART is transmitting data.

The UART mode benchmarks are executed with all configurable hardware components disabled,
with the exception of the DRAM banks, static memory, and instruction cache. For each benchmark,
power usage was measured while the SA-1100 was in idle mode, and while the processor was
executing a busy-wait loop. The three UART modes measured are:

• Disabled - Note that this benchmark is identical to the idle and busy-wait clock-rate bench- 
marks. It is included here for reference. 

• Auto - measures power usage while the UART is in auto-shutdown mode. In this mode, the 
UART is enabled while it is connected to another serial port (e.g., that of a workstation), and
disabled otherwise. No data is transmitted during this benchmark, but the serial port of the 
UART is connected to the serial port of a workstation. 

• Enabled - measures power usage while the UART is enabled. No data is transmitted during 
this benchmark, but the Itsy is connected to the serial port of a workstation. 

6.2 Data 

• Figure 28: Input Power Consumption for UART Mode Micro-benchmarks

• Figure 29: Main (3.3V) Power Consumption for UART Mode Micro-benchmarks 

• Figure 30: Core (1.5V) Power Consumption for UART Mode Micro-benchmarks

Figure 29: Main (3.3V) Power Consumption for UART Mode Micro-benchmarks
Figure 30: Core (1.5V) Power Consumption for UART Mode Micro-benchmarks

6.3 Discussion

These results show that the power cost of enabling the UART is fairly constant at approximately
44 mW. If auto-shutdown mode is employed and the UART is connected to a serial port, the 44 mW
cost is incurred even when no data is being transmitted. It would be interesting to see if the UART
uses any energy in auto-shutdown mode when it is not connected to a serial port.

Figure 31: Input Power Consumption for UART Rate Micro-benchmarks



Clock Rate                  Power (W), by baud rate
(MHz)        9600     19200    38400    57600    115200
 59.0        0.081    0.085    0.095    0.104    0.134
132.7        0.081    0.085    0.093    0.102    0.128
206.4        0.081    0.085    0.093    0.101    0.127


Figure 32: Main (3.3V) Power Consumption for UART Rate Micro-benchmarks 

7 UART Rate Micro-benchmarks 
7.1 Description 

This micro-benchmark measures the effect of using different data rates to transmit data using the 
Itsy UART to/from a workstation over a serial connection. The benchmark transmits 100,000 bytes 
of data from the Itsy to the host computer. All supported data rates between 9,600 and 115,200 
baud were measured for three different clock frequencies. All other configurable hardware compo- 
nents are disabled, with the exception of the DRAM banks, static memory, and the instruction 
cache. 



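7.2 Data

• Figure 31: Input Power Consumption for UART Rate Micro-benchmarks

• Figure 32: Main (3.3V) Power Consumption for UART Rate Micro-benchmarks

• Figure 33: Core (1.5V) Power Consumption for UART Rate Micro-benchmarks

• Figure 34: Duration of UART Rate Micro-benchmarks

• Figure 35: Energy Consumption of UART Rate Micro-benchmarks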



7.3 Discussion 

Figure 33: Core (1.5V) Power Consumption for UART Rate Micro-benchmarks
Figure 34: Duration of UART Rate Micro-benchmarks
Figure 35: Energy Consumption of UART Rate Micro-benchmarks

No energy benefit is achieved by reducing the UART transmission rate (in fact, the opposite is true
since the decrease in power consumption does not offset the increased transmission time). However,
reducing the clock frequency before beginning transmission can significantly reduce energy usage.
This may be related to the results of the idle mode benchmark discussed previously, since the Itsy
is placed in idle mode when it is not transmitting or receiving data. It may be that a solution that
changes the clock frequency before entering idle mode could also help reduce energy consumption
when the UART is transmitting data.
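The effect of the baud rate on energy can be estimated from the Figure 32 power values. The sketch below covers only the 3.3 V domain at 59.0 MHz and uses an idealized wire time in place of the measured durations; it also assumes 10 bits on the wire per byte (8N1 framing), which the report does not state.

```python
PAYLOAD_BYTES = 100_000        # bytes transmitted by the benchmark
BITS_PER_BYTE_ON_WIRE = 10     # assumption: 8 data bits plus start and stop bits

# Main (3.3 V) power in watts at a 59.0 MHz core clock, from the Figure 32 data.
MAIN_POWER_W_59MHZ = {9600: 0.081, 19200: 0.085, 38400: 0.095,
                      57600: 0.104, 115200: 0.134}


def transmit_seconds(baud, nbytes=PAYLOAD_BYTES):
    """Idealized time on the wire, ignoring gaps between bytes."""
    return nbytes * BITS_PER_BYTE_ON_WIRE / baud


def main_domain_energy_joules(baud):
    """Estimated 3.3 V domain energy for the whole transfer."""
    return MAIN_POWER_W_59MHZ[baud] * transmit_seconds(baud)


for baud in sorted(MAIN_POWER_W_59MHZ):
    print(f"{baud:6d} baud: {transmit_seconds(baud):7.2f} s, "
          f"{main_domain_energy_joules(baud):5.2f} J")
```

In this estimate, power rises by well under a factor of two from 9,600 to 115,200 baud while the transfer time falls by a factor of twelve, so the fastest rate uses the least energy.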

Figure 36: Input Power Consumption for LCD Micro-benchmarks



8 LCD Micro-benchmarks 

8.1 Description 

These micro-benchmarks examine the effect of the LCD display on the Itsy's power usage. Note that the Itsy
uses a reflective, passive matrix display with no back light. The benchmarks measure power usage
for both the main and auxiliary LCD controllers. Where applicable, the power used to display
multiple images is measured. Each benchmark was performed with the SA-1100 in idle mode, and
while executing a busy-wait loop. Also, three different clock-rates were used.

Unless otherwise noted, all configurable hardware components are disabled, with the exception 
of the DRAM banks, flash memory, and the instruction cache. 

• Disabled - measures power usage while both the main and auxiliary controllers are disabled. 
Note that this is identical to the idle and busy-wait clock-rate benchmarks. 

• Auxiliary - measures power usage while the auxiliary LCD controller is used to display an 
image in black and white. Power usage is measured for the standard grey-scale and astronaut 
images. 

• Enabled - measures power usage while the main LCD controller is enabled. Power usage is 
measured for the standard grey-scale and astronaut images. 

8.2 Data 

• Figure 36: Input Power Consumption for LCD Micro-benchmarks 

• Figure 37: Main (3.3V) Consumption for LCD Micro-benchmarks 

• Figure 38: Core (1.5V) Power Consumption for LCD Micro-benchmarks 
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Figure 37: Main (3.3V) Consumption for LCD Micro-benchmarks 
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Figure 38: Core (1.5V) Power Consumption for LCD Micro-benchmarks 
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8.3 Discussion 

There appears to be considerable variation in the power consumption of the LCD controllers. 
When the SA-1100 is in idle mode, enabling the main LCD controller consumes an additional 36-41 
mW when the grey-scale image is displayed and an additional 47-52 mW when the astronaut image 
is displayed. When a busy-wait loop is executed, the LCD controller uses an additional 28-36 mW 
when the grey-scale image is displayed and an additional 38-47 mW when the astronaut image is 
displayed. 

The auxiliary LCD controller uses a surprising amount of power. In idle mode, the auxiliary 
controller uses 25 mW to display the grey-scale image and 21-22 mW to display the astronaut 
image. During a busy-wait, the auxiliary controller uses 20-22 mW to display the grey-scale image 
and 18-22 mW to display the astronaut image. 
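
To put these additional power figures in perspective, the following minimal sketch converts a constant extra power draw into energy over a time interval. The function name and the one-hour interval are illustrative assumptions and do not come from the report; only the mW ranges are taken from the idle-mode discussion above.

```python
def extra_energy_joules(extra_power_mw: float, seconds: float) -> float:
    """Energy in joules drawn by a constant additional power (mW) over a time span."""
    return extra_power_mw / 1000.0 * seconds


# Additional idle-mode power ranges quoted in the discussion, in mW.
IDLE_EXTRA_MW = {
    "main controller, grey-scale image": (36, 41),
    "main controller, astronaut image": (47, 52),
    "auxiliary controller, astronaut image": (21, 22),
}

HOUR_S = 3600.0  # illustrative interval (assumption, not from the report)

for name, (low_mw, high_mw) in IDLE_EXTRA_MW.items():
    low_j = extra_energy_joules(low_mw, HOUR_S)
    high_j = extra_energy_joules(high_mw, HOUR_S)
    print(f"{name}: {low_j:.1f} to {high_j:.1f} J per hour")
```

The same conversion applies to the busy-wait figures; only the input ranges change.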
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9 Cache Flush Micro-benchmarks 

9.1 Description 

This set of micro-benchmarks measures the cost of flushing data from the instruction and data 
caches. These measurements include both the cost of the actual flush operation and the 
additional cost of reloading data into the cache. For the data cache, it is also necessary 
to write possibly dirty data back to memory before the flush is performed, and this cost is also 
included in the measurements. 

The cost of flushing the cache will depend on the usage patterns of the particular application 
performing the flush. These benchmarks are structured to assume that most flushed data will 
need to be reloaded. Therefore, they represent a worst-case scenario for the cost of flushing the cache. 
Applications that do not reload flushed data may see a considerably smaller energy penalty. The 
three cache flush operations examined are: 

• Instruction Cache Flush - This benchmark flushes the instruction cache, then reloads 
approximately 8K of instructions into the cache. This scenario is compared with a warm-cache 
scenario in which the 8K of instructions are executed in a loop without flushing the cache. 

• Data Cache Flush - This benchmark writes the contents of the data cache to memory (which 
is necessary to write back any dirty cache blocks), invalidates all blocks in the data cache, 
and then reloads the cache by reading data from memory. This scenario is compared with a 
warm-cache scenario in which the cache is not flushed. 

• Data Block Flush - This benchmark writes a word back to memory, invalidates the data cache 
block, and reloads it from memory. Like the previous benchmarks, the cost of the cache flush 
is measured by comparison with a scenario in which the flush is not performed. 

Each benchmark measures the time to perform the actual flush (including the time needed to 
write dirty data back to memory). The benchmarks also measure the cost of flushing the cache 
by comparing a cold-cache scenario (in which the cache is flushed) with an equivalent warm-cache 
scenario (in which the cache is not flushed). 

9.2 Data 

• Figure 39: Cost of Flushing Data and Instruction Caches 

9.3 Discussion 

The cost of an instruction cache flush is very application-dependent, since the actual flush takes 
approximately 1 µs and consequently consumes only a small amount of energy. A potentially 
greater cost in both time and energy occurs when instruction data must be reloaded. As the 
benchmark shows, in a worst-case scenario, an instruction cache flush incurs a penalty 
of 79 µs and 48 µJ. 




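                         | Flush Time | Time (µs)     | Power (mW)  | Energy (µJ)   |
Benchmark                | (µs)       | Warm  | Cold  | Warm | Cold | Warm  | Cold  |
-------------------------+------------+-------+-------+------+------+-------+-------+
Instruction Cache Flush  | 1          | 129   | 208   | 804  | 730  | 104   | 152   |
Data Cache Flush         | 20         | 51    | 457   | 950  | 768  | 48    | 351   |
Data Line Flush          | <1         | 0.019 | 0.068 | 902  | 827  | 0.017 | 0.056 |

Figure 39: Cost of Flushing Data and Instruction Caches 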



The cost of a data cache flush is also somewhat application-dependent. It takes approximately 
20 µs to clean and flush the cache. However, reloading the cache can be expensive. In a worst-case 
scenario, this incurs a penalty of 406 µs and 303 µJ. 

Flushing a line from the data cache is very quick, and consequently incurs little penalty (0.039 
µJ), even if that line is immediately reread from memory. 
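
The penalties quoted in this discussion can be rederived from the warm and cold columns of Figure 39. The sketch below is only an illustration: it assumes that energy is the product of the average power and the duration, and that the penalty is the cold value minus the warm value. The names used are hypothetical, not taken from the benchmark source.

```python
# (warm_us, cold_us, warm_mw, cold_mw) copied from Figure 39.
FIGURE_39 = {
    "Instruction Cache Flush": (129, 208, 804, 730),
    "Data Cache Flush": (51, 457, 950, 768),
    "Data Line Flush": (0.019, 0.068, 902, 827),
}


def energy_uj(time_us: float, power_mw: float) -> float:
    """Energy in microjoules: us * mW gives nJ, so divide by 1000."""
    return time_us * power_mw / 1000.0


def flush_penalty(warm_us, cold_us, warm_mw, cold_mw):
    """Return (time penalty in us, energy penalty in uJ) of cold versus warm."""
    time_penalty_us = cold_us - warm_us
    energy_penalty_uj = energy_uj(cold_us, cold_mw) - energy_uj(warm_us, warm_mw)
    return time_penalty_us, energy_penalty_uj


for name, row in FIGURE_39.items():
    time_penalty_us, energy_penalty_uj = flush_penalty(*row)
    print(f"{name}: {time_penalty_us:.3f} us, {energy_penalty_uj:.3f} uJ")
```

Because the table entries are rounded, the recomputed energies can differ from the tabulated ones in the last digit.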
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A Memory Access Times and Bandwidths 

This appendix presents the DRAM memory access times and bandwidths supported by the Itsy 
Pocket Computer Architecture, as a function of the bus clock frequency. This data is valid for 
version 1.5, version 2.1, and version 2.2 of the Itsy Pocket Computer. 

Table 1 gives the data for 50-nanosecond EDO DRAMs, while Table 2 gives the data for 
45-nanosecond EDO DRAMs. Note that the tables show all possible frequencies of the SA-1100, even 
those beyond specifications. 

Memory settings and access speeds differ depending on whether all of the DRAM is on the 
mother-board or some of it is on a memory expansion daughter-card. Note that in the latter case, 
access to the mother-board DRAM banks is as slow as access to the daughter-card banks (i.e., all 
banks are accessed at the speed of the slowest one). 

Access time and bandwidth are shown both for single-word accesses and for 8-word burst 
accesses (cache line fill). 
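
The bandwidth columns appear to follow directly from the access times. The sketch below assumes 32-bit (4-byte) words and 1 MB = 10^6 bytes, which is inferred from the tabulated values rather than stated in the text. The function name is hypothetical.

```python
def bandwidth_mb_per_s(words: int, access_time_ns: float, bytes_per_word: int = 4) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for an access of `words` words."""
    bytes_moved = words * bytes_per_word
    # bytes per nanosecond equals GB/s; multiply by 1000 to obtain MB/s.
    return bytes_moved / access_time_ns * 1000.0


# Access times at 59.0 MHz with all DRAM on the mother-board.
single_word = bandwidth_mb_per_s(1, 152.6)   # single-word access
burst_fill = bandwidth_mb_per_s(8, 627.3)    # 8-word burst (cache line fill)
print(f"single word: {single_word:.1f} MB/s, burst: {burst_fill:.1f} MB/s")
```

Entries of 100 MB/s and above appear to be rounded to three significant figures in the tables, so a recomputed value can differ from them in the first decimal place.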
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Table 1: 50-ns EDO DRAMs 
 Freq.  ||     DRAM on Mother-board only     ||       DRAM on Daughter-Card       ||
        ||    1 x 32 bit   |    8 x 32 bit   ||    1 x 32 bit   |    8 x 32 bit   ||
        ||  Time  | Bandw. |  Time  | Bandw. ||  Time  | Bandw. |  Time  | Bandw. ||
 [MHz]  ||  [ns]  | [MB/s] |  [ns]  | [MB/s] ||  [ns]  | [MB/s] |  [ns]  | [MB/s] ||
--------++--------+--------+--------+--------++--------+--------+--------+--------++
  59.0  ||  152.6 |   26.2 |  627.3 |   51.0 ||  152.6 |   26.2 |  627.3 |   51.0 ||
  73.7  ||  122.1 |   32.8 |  501.8 |   63.8 ||  149.2 |   26.8 |  529.0 |   60.5 ||
  88.5  ||  124.3 |   32.2 |  440.8 |   72.6 ||  124.3 |   32.2 |  440.8 |   72.6 ||
 103.2  ||  106.6 |   37.5 |  377.8 |   84.7 ||  106.6 |   37.5 |  377.8 |   84.7 ||
 118.0  ||   93.2 |   42.9 |  330.6 |   96.8 ||   93.2 |   42.9 |  330.6 |   96.8 ||
 132.7  ||   82.9 |   48.3 |  293.9 |  109.0 ||   82.9 |   48.3 |  293.9 |  109.0 ||
 147.5  ||   81.4 |   49.2 |  271.3 |  118.0 ||   94.9 |   42.1 |  284.8 |  112.0 ||
 162.2  ||   86.3 |   46.3 |  258.9 |  124.0 ||   86.3 |   46.3 |  302.1 |  106.0 ||
 176.9  ||   84.8 |   47.2 |  282.6 |  113.0 ||   84.8 |   47.2 |  282.6 |  113.0 ||
 191.7  ||   78.3 |   51.1 |  260.8 |  123.0 ||   83.5 |   47.9 |  302.6 |  106.0 ||
 206.4  ||   77.5 |   51.6 |  247.0 |  130.0 ||   82.3 |   48.6 |  285.8 |  112.0 ||
--------++--------+--------+--------+--------++--------+--------+--------+--------++
 221.2  ||   76.9 |   52.0 |  266.7 |  120.0 ||   85.9 |   46.6 |  275.8 |  116.0 ||
 235.9  ||   80.5 |   49.7 |  258.6 |  124.0 ||   84.8 |   47.2 |  292.5 |  109.0 ||
 265.4  ||   79.1 |   50.6 |  263.7 |  121.0 ||   82.9 |   48.3 |  293.9 |  109.0 ||
 294.9  ||   78.0 |   51.3 |  267.9 |  119.0 ||   84.8 |   47.2 |  274.7 |  117.0 ||
 309.7  ||   80.7 |   49.5 |  261.6 |  122.0 ||   84.0 |   47.6 |  287.4 |  111.0 ||
 324.4  ||   77.1 |   51.9 |  249.7 |  128.0 ||   80.1 |   49.9 |  274.3 |  117.0 ||
 353.9  ||   76.3 |   52.4 |  254.3 |  126.0 ||   84.8 |   47.2 |  282.6 |  113.0 ||
 383.4  ||   75.6 |   52.9 |  258.2 |  124.0 ||   80.9 |   49.5 |  281.7 |  114.0 ||

Table 2: 45-ns EDO DRAMs 
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